
MSC: a metagenomic sequence classification algorithm.

Internal identifier: 000492 (Main/Exploration); previous: 000491; next: 000493

Authors: Subrata Saha [United States]; Jethro Johnson [United States]; Soumitra Pal [United States]; George M. Weinstock [United States]; Sanguthevar Rajasekaran [United States]

Source:

Bioinformatics (Oxford, England) [1367-4811]; 2019.

RBID : pubmed:30649204

Abstract

Metagenomics is the study of genetic materials directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference genomes to identify microbes in the sample. Since such a collection of reference genomes is very large, the approach often needs high-end computing machines with large memory which is not often available to researchers. Alternative approaches follow an alignment-free methodology where the presence of a microbe is predicted using the information about the unique k-mers present in the microbial genomes. However, such approaches suffer from high false positives due to trading off the value of k with the computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to the full genomes, MSC aligns reads onto a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each of the reference sequences.

DOI: 10.1093/bioinformatics/bty1071
PubMed: 30649204
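
The abstract refers to the unique k-mers of each reference sequence. As a rough illustration of that idea only, the sketch below collects, for each reference, the k-mers that occur in no other reference. It is not the MSC implementation: the function name `unique_kmers`, the toy sequences, and the choice k=4 are assumptions made for this example, and reverse complements are ignored for simplicity.

```python
from collections import defaultdict


def unique_kmers(references, k):
    """Return, for each reference name, the k-mers found in no other reference."""
    owners = defaultdict(set)  # k-mer -> names of the references containing it
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    unique = {name: set() for name in references}
    for kmer, names in owners.items():
        if len(names) == 1:  # seen in exactly one reference
            unique[next(iter(names))].add(kmer)
    return unique


# Toy sequences, invented for illustration only (hypothetical data).
toy_references = {
    "genome_a": "ACGTACGGA",
    "genome_b": "ACGTTCGGA",
}
result = unique_kmers(toy_references, k=4)
for name in sorted(result):
    print(name, sorted(result[name]))
```

In this toy case the k-mers `ACGT` and `CGGA` occur in both sequences and are therefore discarded. How MSC then builds its model sequences from such k-mers is not described in the abstract and is not attempted here.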

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">MSC: a metagenomic sequence classification algorithm.</title>
<author><name sortKey="Saha, Subrata" sort="Saha, Subrata" uniqKey="Saha S" first="Subrata" last="Saha">Subrata Saha</name>
<affiliation wicri:level="2"><nlm:affiliation>Healthcare and Life Sciences Division, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Healthcare and Life Sciences Division, IBM Thomas J. Watson Research Center, Yorktown Heights, NY</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Johnson, Jethro" sort="Johnson, Jethro" uniqKey="Johnson J" first="Jethro" last="Johnson">Jethro Johnson</name>
<affiliation wicri:level="2"><nlm:affiliation>The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The Jackson Laboratory for Genomic Medicine, Farmington, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<affiliation wicri:level="2"><nlm:affiliation>National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD</wicri:regionArea>
<placeName><region type="state">Maryland</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Weinstock, George M" sort="Weinstock, George M" uniqKey="Weinstock G" first="George M" last="Weinstock">George M. Weinstock</name>
<affiliation wicri:level="2"><nlm:affiliation>The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The Jackson Laboratory for Genomic Medicine, Farmington, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2"><nlm:affiliation>Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Science and Engineering Department, University of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2019">2019</date>
<idno type="RBID">pubmed:30649204</idno>
<idno type="pmid">30649204</idno>
<idno type="doi">10.1093/bioinformatics/bty1071</idno>
<idno type="wicri:Area/PubMed/Corpus">000670</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000670</idno>
<idno type="wicri:Area/PubMed/Curation">000670</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000670</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000488</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000488</idno>
<idno type="wicri:Area/Ncbi/Merge">002081</idno>
<idno type="wicri:Area/Ncbi/Curation">002081</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">002081</idno>
<idno type="wicri:Area/Main/Merge">000495</idno>
<idno type="wicri:Area/Main/Curation">000492</idno>
<idno type="wicri:Area/Main/Exploration">000492</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">MSC: a metagenomic sequence classification algorithm.</title>
<author><name sortKey="Saha, Subrata" sort="Saha, Subrata" uniqKey="Saha S" first="Subrata" last="Saha">Subrata Saha</name>
<affiliation wicri:level="2"><nlm:affiliation>Healthcare and Life Sciences Division, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Healthcare and Life Sciences Division, IBM Thomas J. Watson Research Center, Yorktown Heights, NY</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Johnson, Jethro" sort="Johnson, Jethro" uniqKey="Johnson J" first="Jethro" last="Johnson">Jethro Johnson</name>
<affiliation wicri:level="2"><nlm:affiliation>The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The Jackson Laboratory for Genomic Medicine, Farmington, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<affiliation wicri:level="2"><nlm:affiliation>National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD</wicri:regionArea>
<placeName><region type="state">Maryland</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Weinstock, George M" sort="Weinstock, George M" uniqKey="Weinstock G" first="George M" last="Weinstock">George M. Weinstock</name>
<affiliation wicri:level="2"><nlm:affiliation>The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The Jackson Laboratory for Genomic Medicine, Farmington, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2"><nlm:affiliation>Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Science and Engineering Department, University of Connecticut, Storrs, CT</wicri:regionArea>
<placeName><region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2019" type="published">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Metagenomics is the study of genetic materials directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference genomes to identify microbes in the sample. Since such a collection of reference genomes is very large, the approach often needs high-end computing machines with large memory which is not often available to researchers. Alternative approaches follow an alignment-free methodology where the presence of a microbe is predicted using the information about the unique k-mers present in the microbial genomes. However, such approaches suffer from high false positives due to trading off the value of k with the computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to the full genomes, MSC aligns reads onto a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each of the reference sequences.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Connecticut</li>
<li>Maryland</li>
<li>État de New York</li>
</region>
</list>
<tree><country name="États-Unis"><region name="État de New York"><name sortKey="Saha, Subrata" sort="Saha, Subrata" uniqKey="Saha S" first="Subrata" last="Saha">Subrata Saha</name>
</region>
<name sortKey="Johnson, Jethro" sort="Johnson, Jethro" uniqKey="Johnson J" first="Jethro" last="Johnson">Jethro Johnson</name>
<name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<name sortKey="Weinstock, George M" sort="Weinstock, George M" uniqKey="Weinstock G" first="George M" last="Weinstock">George M. Weinstock</name>
</country>
</tree>
</affiliations>
</record>
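
As a minimal sketch of reading identifiers from a record like the one above, the Python snippet below parses a small fragment of the `publicationStmt` element and maps each `idno` type to its value. The helper name `read_identifiers` is hypothetical. Note that the full record uses `wicri:` and `nlm:` prefixes on elements and attributes with no namespace declarations visible in this excerpt, so a namespace-aware parser such as `xml.etree.ElementTree` would be expected to reject it unless those prefixes are declared first; the fragment here is limited to lines that carry no such prefixes.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the record's <publicationStmt>; only lines without
# wicri:/nlm: prefixed names are included so that it parses on its own.
FRAGMENT = """<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2019">2019</date>
<idno type="RBID">pubmed:30649204</idno>
<idno type="pmid">30649204</idno>
<idno type="doi">10.1093/bioinformatics/bty1071</idno>
</publicationStmt>"""


def read_identifiers(xml_text):
    """Map each <idno> element's type attribute to its text content."""
    root = ET.fromstring(xml_text)
    return {node.get("type"): node.text for node in root.iter("idno")}


identifiers = read_identifiers(FRAGMENT)
print(identifiers["doi"])
```

The value `wicri:source` is accepted because it is an attribute value rather than a prefixed name.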

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000492 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000492 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:30649204
   |texte=   MSC: a metagenomic sequence classification algorithm.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:30649204" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

MERS exploration server

Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.